1.
Theor Popul Biol ; 154: 27-39, 2023 12.
Article in English | MEDLINE | ID: mdl-37544486

ABSTRACT

Recombination is a powerful evolutionary process that shapes the genetic diversity observed in the populations of many species. Reconstructing genealogies in the presence of recombination from sequencing data is a very challenging problem, as this relies on mutations having occurred on the correct lineages in order to detect the recombination and resolve the ordering of coalescence events in the local trees. We investigate the probability of reconstructing the true topology of ancestral recombination graphs (ARGs) under the coalescent with recombination and gene conversion. We explore how sample size and mutation rate affect the inherent uncertainty in reconstructed ARGs, which sheds light on the theoretical limitations of ARG reconstruction methods. We illustrate our results using estimates of evolutionary rates for several organisms; in particular, we find that for parameter values that are realistic for SARS-CoV-2, the probability of reconstructing genealogies that are close to the truth is low.


Subjects
Algorithms, Genetic Recombination, Genetic Models, Mutation, Biological Evolution, Phylogeny
2.
Front Public Health ; 11: 1019223, 2023.
Article in English | MEDLINE | ID: mdl-36908465

ABSTRACT

Background: Mandatory COVID-19 certification, requiring proof of vaccination, a negative test, or recent infection to access public venues, was introduced at different times in the four countries of the UK. We aim to study its effects on the incidence of cases and hospital admissions. Methods: We performed negative binomial segmented regression (NBSR) and ARIMA analyses for the four countries (England, Northern Ireland, Scotland and Wales), and fitted Difference-in-Differences models to compare the latter three to England, as a negative control group, since it was the last country where COVID-19 certification was introduced. The main outcome was the weekly averaged incidence of COVID-19 cases and hospital admissions. Results: COVID-19 certification led to a decrease in the incidence of cases and hospital admissions in Northern Ireland, as well as in Wales during the second half of November. The same was seen for hospital admissions in Wales and Scotland during October. In Wales, as in England, the incidence rate of cases was already decreasing in October, so a specific impact of COVID-19 certification was less obvious. Method assumptions for the Difference-in-Differences analysis did not hold for Scotland. Additional NBSR and ARIMA models suggest similar results, while also accounting for correlation in the latter. The assessment of the effect in England itself suggests that this intervention might not be strong enough for the Omicron variant, which was prevalent at the time of introduction of COVID-19 certification in the country. Conclusions: Mandatory COVID-19 certification reduced COVID-19 transmission and hospitalizations when Delta predominated in the UK, but lost efficacy when Omicron became the most common variant.


Subjects
COVID-19 Vaccines, COVID-19, Humans, United Kingdom/epidemiology, Hospitalization, COVID-19/epidemiology, COVID-19/prevention & control, Vaccination, COVID-19 Vaccines/administration & dosage, SARS-CoV-2, Incidence, Mandatory Programs
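
The Difference-in-Differences comparison mentioned in the abstract above can be illustrated with the simplest two-group, two-period form. This is a minimal sketch only: the function name and the numbers in the usage note are hypothetical, and the study itself fitted regression models rather than this raw contrast.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two Difference-in-Differences contrast.

    Returns the change in the treated group minus the change in the
    control group. Interpreting it as an intervention effect assumes the
    two groups would otherwise have followed parallel trends.
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change
```

For example, with made-up weekly incidences of 100 then 80 in a treated country and 100 then 95 in the control, `did_estimate(100, 80, 100, 95)` gives -15: the treated country fell by 15 more than the control did.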
3.
PLoS Comput Biol ; 18(6): e1009414, 2022 06.
Article in English | MEDLINE | ID: mdl-35731801

ABSTRACT

Gene expression is controlled by pathways of regulatory factors often involving the activity of protein kinases on transcription factor proteins. Despite this well-established mechanism, the number of well-described pathways that include the regulatory role of protein kinases on transcription factors is surprisingly small in eukaryotes. To address this, PhosTF was developed to infer functional regulatory interactions and pathways in both simulated and real biological networks, based on linear cyclic causal models with latent variables. GeneNetWeaverPhos, an extension of GeneNetWeaver, was developed to allow the simulation of perturbations in known networks that included the activity of protein kinases and phosphatases on gene regulation. Over 2000 genome-wide gene expression profiles, where the loss or gain of regulatory genes could be observed to perturb gene regulation, were then used to infer the existence of regulatory interactions, and their mode of regulation, in the budding yeast Saccharomyces cerevisiae. Despite the additional complexity, our inference performed comparably to the best methods that inferred transcription factor regulation assessed in the DREAM4 challenge on similar simulated networks. Inference on integrated genome-scale data sets for yeast identified ∼8800 protein kinase/phosphatase-transcription factor interactions and ∼6500 interactions among protein kinases and/or phosphatases. Both types of regulatory predictions captured statistically significant numbers of known interactions of their type. Surprisingly, kinases and phosphatases regulated transcription factors by a negative mode of regulation (deactivation) in over 70% of the predictions.


Subjects
Phosphoric Monoester Hydrolases, Protein Kinases, Gene Expression Profiling, Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Phosphoric Monoester Hydrolases/genetics, Phosphoric Monoester Hydrolases/metabolism, Protein Kinases/genetics, Protein Kinases/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism
4.
PLoS Biol ; 20(6): e3001626, 2022 06.
Article in English | MEDLINE | ID: mdl-35658016

ABSTRACT

The evolution of cooperation in cellular groups is threatened by lineages of cheaters that proliferate at the expense of the group. These cell lineages occur within microbial communities, and within multicellular organisms in the form of tumours and cancer. In contrast to an earlier study, here we show how the evolution of pleiotropic genetic architectures, which link the expression of cooperative and private traits, can protect against cheater lineages and allow cooperation to evolve. We develop an age-structured model of cellular groups and show that cooperation breaks down more slowly within groups that tie expression to a private trait than in groups that do not. We then show that this results in group selection for pleiotropy, which strongly promotes cooperation by limiting the emergence of cheater lineages. These results predict that pleiotropy will rapidly evolve, so long as groups persist long enough for cheater lineages to threaten cooperation. Our results hold when pleiotropic links can be undermined by mutations, when pleiotropy is itself costly, and in mixed-genotype groups such as those that occur in microbes. Finally, we consider features of multicellular organisms (a germ line and delayed reproductive maturity) and show that pleiotropy is again predicted to be important for maintaining cooperation. The study of cancer in multicellular organisms provides the best evidence for pleiotropic constraints, where aberrant cell proliferation is linked to apoptosis, senescence, and terminal differentiation. Alongside development from a single cell, we propose that the evolution of pleiotropic constraints has been critical for cooperation in many cellular groups.


Subjects
Biological Evolution, Microbiota, Genotype, Mutation, Phenotype
5.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-35106601

ABSTRACT

The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity. The extent to which recombination is ongoing for SARS-CoV-2 is not yet resolved. To address this, we use a parsimony-based method to reconstruct possible genealogical histories for samples of SARS-CoV-2 sequences, which enables us to pinpoint specific recombination events that could have generated the data. We propose a statistical framework for disentangling the effects of recurrent mutation from recombination in the history of a sample, and hence provide a way of estimating the probability that ongoing recombination is present. We apply this to samples of sequencing data collected in England and South Africa and find evidence of ongoing recombination.


Subjects
COVID-19, SARS-CoV-2, Viral Genome, Humans, Mutation, Phylogeny, Genetic Recombination
6.
Bioinformatics ; 37(19): 3277-3284, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33970217

ABSTRACT

MOTIVATION: The reconstruction of possible histories given a sample of genetic data in the presence of recombination and recurrent mutation is a challenging problem, but can provide key insights into the evolution of a population. We present KwARG, which implements a parsimony-based greedy heuristic algorithm for finding plausible genealogical histories (ancestral recombination graphs) that are minimal or near-minimal in the number of posited recombination and mutation events. RESULTS: Given an input dataset of aligned sequences, KwARG outputs a list of possible candidate solutions, each comprising a list of mutation and recombination events that could have generated the dataset; the relative proportion of recombinations and recurrent mutations in a solution can be controlled via specifying a set of 'cost' parameters. We demonstrate that the algorithm performs well when compared against existing methods. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/a-ignatieva/kwarg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
J Chem Inf Model ; 61(4): 1637-1646, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33844913

ABSTRACT

A main challenge in the enumeration of small-molecule chemical spaces for drug design is to quickly and accurately differentiate between possible and impossible molecules. Current approaches for screening enumerated molecules (e.g., 2D heuristics and 3D force fields) have not been able to achieve a balance between accuracy and speed. We have developed a new automated approach for fast and high-quality screening of small molecules, with the following steps: (1) for each molecule in the set, an ensemble of 2D descriptors as feature encoding is computed; (2) on a random small subset, classification (feasible/infeasible) targets via a 3D-based approach are generated; (3) a classification dataset with the computed features and targets is formed and a machine learning model for predicting the 3D approach's decisions is trained; and (4) the trained model is used to screen the remainder of the enumerated set. Our approach is ≈8× (7.96× to 8.84×) faster than screening via 3D simulations without significantly sacrificing accuracy; while compared to 2D-based pruning rules, this approach is more accurate, with better coverage of known feasible molecules. Once the topological features and 3D conformer evaluation methods are established, the process can be fully automated, without any additional chemistry expertise.


Subjects
Drug Design, Machine Learning
8.
Theor Popul Biol ; 134: 61-76, 2020 08.
Article in English | MEDLINE | ID: mdl-32439294

ABSTRACT

The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained with an independent Bernoulli trial with probability ψ, and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as ψ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of n logistic random variables with mode log(1∕ψ); moreover, we show that the inter-event times are approximately exponentially distributed.


Subjects
Population Density, Humans, Probability
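
The scaling-limit statement in the abstract above (event times as the order statistics of n logistic random variables with mode log(1/ψ)) can be simulated directly. The sketch below is illustrative only: the function name is hypothetical, and the unit scale parameter is an assumption, since the abstract gives neither the scale nor the exact linear rescaling.

```python
import math
import random


def sample_rescaled_event_times(n, psi, rng=None, scale=1.0):
    """Draw n logistic variates with mode log(1/psi) and return them sorted.

    The sorted values play the role of the rescaled event times in the
    psi -> 0 limit. `scale` = 1.0 is an assumption made for illustration.
    """
    rng = rng or random.Random()
    mode = math.log(1.0 / psi)
    times = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, which log() cannot take
            u = rng.random()
        # Inverse-CDF sampling for the logistic distribution.
        times.append(mode + scale * math.log(u / (1.0 - u)))
    return sorted(times)
```

Calling `sample_rescaled_event_times(10, 0.01, random.Random(1))` returns ten increasing values scattered around log(100) ≈ 4.6.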
9.
Math Biosci ; 325: 108365, 2020 07.
Article in English | MEDLINE | ID: mdl-32360772

ABSTRACT

A key step in the origin of life is the emergence of a primitive metabolism. This requires the formation of a subset of chemical reactions that is both self-sustaining and collectively autocatalytic. A generic approach to study such processes ('RAF theory') has provided a precise and computationally effective way to address these questions, both on simulated data and in laboratory studies. In this paper, we solve some questions posed in more recent papers concerning the computational complexity of some key questions in RAF theory. In particular, although there is a fast algorithm to determine whether or not a catalytic reaction network contains a subset that is both self-sustaining and autocatalytic (and, if so, find one), determining whether or not sets exist that satisfy certain additional constraints turns out to be NP-hard.


Subjects
Biological Models, Origin of Life, Algorithms, Biocatalysis, Biochemical Phenomena, Biological Evolution, Catalysis, Mathematical Concepts, Metabolic Networks and Pathways, Chemical Models, Systems Biology
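
The "fast algorithm" that the abstract above refers to is not spelled out there. As a rough sketch, the reduction procedure commonly described in the RAF literature repeatedly discards reactions whose reactants or catalysts cannot be produced from the food set. The data representation (reactions as triples of frozensets) and the function names are assumptions made for this example.

```python
def closure(food, reactions):
    """Molecules reachable from the food set, ignoring catalysis."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(food, reactions):
    """Return the largest self-sustaining, autocatalytic subset (may be empty).

    Each reaction is a (reactants, products, catalysts) triple of frozensets.
    A reaction survives a pass only if all of its reactants and at least one
    of its catalysts lie in the closure of the food set under the reactions
    that are still present.
    """
    current = list(reactions)
    while True:
        available = closure(food, current)
        kept = [r for r in current if r[0] <= available and r[2] & available]
        if len(kept) == len(current):
            return kept
        current = kept
```

An empty result means the network contains no such subset. Each pass is polynomial in the size of the network, and at most one pass per reaction is needed, which is why the basic existence question is tractable even though the constrained variants discussed in the abstract are NP-hard.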
10.
Mol Biol Evol ; 37(2): 576-592, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31665393

ABSTRACT

Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure. MESSI can also use graphics processing unit parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in noncoding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and noncoding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses. A potential biophysical explanation is that GT pairs do not stabilize DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than three nonevolutionary measures of base-pairing covariation. To assist researchers in prioritizing substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, among several uncharacterized top-ranking substructures.


Subjects
Computational Biology/methods, DNA/chemistry, RNA/chemistry, Base Pairing, DNA/genetics, Viral DNA/chemistry, Viral DNA/genetics, Molecular Evolution, Molecular Models, RNA/genetics, Untranslated RNA/chemistry, Untranslated RNA/genetics, Viral RNA/chemistry, Viral RNA/genetics, Software
11.
Front Immunol ; 10: 1339, 2019.
Article in English | MEDLINE | ID: mdl-31338090

ABSTRACT

HERV-H endogenous retroviruses are thought to be essential to stem cell identity in humans. We embrace several decades of HERV-H research in order to relate the transcription of HERV-H loci to their genomic structure. We find that highly transcribed HERV-H loci are younger, more fragmented, and less likely to be present in other primate genomes. We also show that repeats in HERV-H LTRs are correlated to where loci are transcribed: type-I LTRs associate with stem cells while type-II repeats associate with embryonic cells. Our findings are generally in line with what is known about endogenous retrovirus biology but we find that the presence of the zinc finger motif containing region of gag is positively correlated with transcription. This leads us to suggest a possible explanation for why an unusually large proportion of HERV-H loci have been preserved in non-solo-LTR form.


Subjects
Endogenous Retroviruses/genetics, Viral Genome/genetics, Terminal Repeat Sequences/genetics, Animals, Base Sequence, Callithrix, Molecular Evolution, env Gene Products/genetics, gag Gene Products/genetics, pol Gene Products/genetics, Genomics, Gorilla gorilla, Humans, Macaca, Pan troglodytes, Pongo, Sequence Alignment, Stem Cells/cytology
12.
Syst Biol ; 68(2): 252-266, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30239957

ABSTRACT

Classic alignment algorithms utilize scoring functions which maximize similarity or minimize edit distances. These scoring functions account for both insertion-deletion (indel) and substitution events. In contrast, alignments based on stochastic models aim to explicitly describe the evolutionary dynamics of sequences by inferring relevant probabilistic parameters from input sequences. Despite advances in stochastic modeling during the last two decades, scoring-based methods are still dominant, partially due to slow running times of probabilistic approaches. Alignment inference using stochastic models involves estimating the probability of events, such as the insertion or deletion of a specific number of characters. In this work, we present SimBa-SAl, a simulation-based approach to statistical alignment inference, which relies on an explicit continuous time Markov model for both indels and substitutions. SimBa-SAl has several advantages. First, using simulations, it decouples the estimation of event probabilities from the inference stage, which allows the introduction of accelerations to the alignment inference procedure. Second, it is general and can accommodate various stochastic models of indel formation. Finally, it allows computing the maximum-likelihood alignment, the probability of a given pair of sequences integrated over all possible alignments, and sampling alternative alignments according to their probability. We first show that SimBa-SAl allows accurate estimation of parameters of the long-indel model previously developed by Miklós et al. (2004). We next show that SimBa-SAl is more accurate than previously developed pairwise alignment algorithms, when analyzing simulated as well as empirical data sets. Finally, we study the goodness-of-fit of the long-indel and TKF91 models. We show that although the long-indel model fits the data sets better than TKF91, there is still room for improvement concerning the realistic modeling of evolutionary sequence dynamics.


Subjects
Classification/methods, Statistical Models, Phylogeny, Computer Simulation, Molecular Evolution, INDEL Mutation/genetics
13.
Mol Biol Evol ; 34(8): 2085-2100, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28453724

ABSTRACT

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.


Subjects
Proteins/genetics, Sequence Alignment/methods, Protein Sequence Analysis/methods, Amino Acid Sequence, Computer Simulation, Molecular Evolution, Genetic Models, Molecular Models, Protein Conformation, Protein Structural Elements/genetics, Proteins/metabolism, Protein Sequence Analysis/statistics & numerical data
14.
Genetics ; 205(4): 1425-1441, 2017 04.
Article in English | MEDLINE | ID: mdl-28179367

ABSTRACT

To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. An association between a structural variant trait at one locus and genotypes at a distant locus indicates the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and resistance to infection by the fungus Albugo laibachii, isolate Nc14. Further, we show that the phenotypic heritability attributable to read-mapping anomalies differs from, and, in the case of time to germination and bolting, exceeds that due to standard genetic variation. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low coverage, and is particularly suited to mapping transpositions.


Subjects
Arabidopsis/genetics, Genomic Structural Variation, Heritable Quantitative Trait, Arabidopsis/growth & development, Arabidopsis/immunology, Phenotype, Plant Immunity/genetics, Quantitative Trait Loci
15.
Nat Plants ; 2(11): 16167, 2016 10 31.
Article in English | MEDLINE | ID: mdl-27797353

ABSTRACT

Finding causal relationships between genotypic and phenotypic variation is a key focus of evolutionary biology, human genetics and plant breeding. To identify genome-wide patterns underlying trait diversity, we assembled a high-quality reference genome of Cardamine hirsuta, a close relative of the model plant Arabidopsis thaliana. We combined comparative genome and transcriptome analyses with the experimental tools available in C. hirsuta to investigate gene function and phenotypic diversification. Our findings highlight the prevalent role of transcription factors and tandem gene duplications in morphological evolution. We identified a specific role for the transcriptional regulators PLETHORA5/7 in shaping leaf diversity and link tandem gene duplication with differential gene expression in the explosive seed pod of C. hirsuta. Our work highlights the value of comparative approaches in genetically tractable species to understand the genetic basis for evolutionary change.


Subjects
Cardamine/genetics, Molecular Evolution, Plant Gene Expression Regulation, Plant Genome, Biological Evolution, Cardamine/anatomy & histology, Gene Duplication, Phylogeny, Plant Proteins/genetics, Plant Proteins/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism
17.
PLoS Comput Biol ; 12(6): e1004964, 2016 06.
Article in English | MEDLINE | ID: mdl-27295277

ABSTRACT

About 8% of the human genome is made up of endogenous retroviruses (ERVs). Though most human endogenous retroviruses (HERVs) are thought to be irrelevant to our biology, notable exceptions include members of the HERV-H family that are necessary for the correct functioning of stem cells. ERVs are commonly found in two forms, the full-length proviral form, and the more numerous solo-LTR form, thought to result from homologous recombination events. Here we introduce a phylogenetic framework to study ERV insertion and solo-LTR formation. We then apply the framework to site patterns sampled from a set of long alignments covering six primate genomes. Studying six categories of ERVs, we quantitatively recapitulate patterns of insertional activity that are usually described in qualitative terms in the literature. A slowdown in most ERV groups is observed, but we suggest that HERV-K activity may have increased in humans since they diverged from chimpanzees. We find that the rate of solo-LTR formation decreases rapidly as a function of ERV age and that an age-dependent model of solo-LTR formation describes the history of ERVs more accurately than the commonly used exponential decay model. We also demonstrate that HERV-H loci are markedly less likely to form solo-LTRs than ERVs from other families. We conclude that the slower dynamics of HERV-H suggest a host role for the internal regions of these exapted elements and posit that in future it will be possible to use the relationship between full-length proviruses and solo-LTRs to help identify large-scale co-options in distant vertebrate genomes.


Subjects
Endogenous Retroviruses/genetics, Human Genome/genetics, Genetic Models, Animals, Base Sequence, Conserved Sequence, Molecular Evolution, Humans, Phylogeny, Primates/genetics
18.
Genetics ; 202(4): 1449-72, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26857628

ABSTRACT

Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput "deep" sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference using HIV deep sequencing data, using an approach based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available.


Subjects
Genetic Variation, HIV Infections/virology, HIV-1/physiology, Algorithms, Computational Biology/methods, Computer Simulation, Molecular Evolution, Viral Genome, Haplotypes, High-Throughput Nucleotide Sequencing, Humans, Genetic Models, Statistical Models, Mutation, Viral RNA, Genetic Recombination, Genetic Selection
19.
Retrovirology ; 12: 52, 2015 Jun 20.
Article in English | MEDLINE | ID: mdl-26088204

ABSTRACT

BACKGROUND: Endogenous retroviruses (ERVs) are often viewed as selfish DNA that do not contribute to host phenotype. Yet ERVs have also been co-opted to play important roles in the maintenance of stem cell identity and placentation, amongst other things. This has led to debate over whether the typical ERV confers a cost or benefit upon the host. We studied the divergence of orthologous ERVs since the chimp-human split with the aim of assessing whether ERVs exert detectable fitness effects. RESULTS: ERVs have evolved faster than other selfish DNA in human and chimpanzee. The divergence of ERVs relative to neighbouring selfish DNA is positively correlated with the length of the long terminal repeat of an ERV and with the percentage of neighbouring DNA that is not selfish. ERVs from the HERV-H family have diverged particularly quickly and in a manner that correlates with their level of transcription in human stem cells. A substitution into a highly transcribed HERV-H has a selective coefficient of the order of 10⁻⁴. This is large enough to suggest these substitutions are not dominated by drift. CONCLUSIONS: ERVs differ from other selfish DNA in the extent to which they diverge and appear to have measurable effects on hosts, even after fixation. The effects are strongest for HERV-H and suggest that the HERV-H transcriptome has recently evolved under the influence of directional selection. As there are many HERV-H loci distributed across the ape lineage, our results suggest that in future this family can be used to model the evolutionary consequences of ERV exaptation in primates and other mammals.


Subjects
Endogenous Retroviruses/genetics, Molecular Evolution, Pan troglodytes/virology, Primates/virology, Animals, Genetic Fitness, Humans, Nucleic Acid Repetitive Sequences, Terminal Repeat Sequences
20.
BMC Bioinformatics ; 16: 108, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25888064

ABSTRACT

BACKGROUND: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment. RESULTS: In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased. CONCLUSIONS: The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.


Subjects
Algorithms, Computational Biology/methods, Computer Graphics, Statistical Models, Sequence Alignment/methods, Software, Computer Simulation, Humans, Uncertainty
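
The claim in the abstract above that the alignment DAG encodes more alignments than were sampled can be seen on a toy case. The sketch below is not the WeaveAlign implementation. The column encoding (each cell records how many residues of its sequence have been consumed, plus a gap flag) and all function names are assumptions chosen for illustration, and the path count assumes the resulting graph is acyclic.

```python
from functools import lru_cache

START, END = "START", "END"


def columns(rows):
    """Encode each column of a gapped alignment as a hashable key.

    Each cell is (residues consumed so far in that row, is_gap), so the
    same column appearing in different sampled alignments gets the same key.
    """
    consumed = [0] * len(rows)
    cols = []
    for j in range(len(rows[0])):
        cell = []
        for i, row in enumerate(rows):
            is_gap = row[j] == "-"
            cell.append((consumed[i], is_gap))
            if not is_gap:
                consumed[i] += 1
        cols.append(tuple(cell))
    return cols


def build_dag(alignments):
    """Link each column to the column that follows it in any sample."""
    edges = {}
    for rows in alignments:
        path = [START] + columns(rows) + [END]
        for a, b in zip(path, path[1:]):
            edges.setdefault(a, set()).add(b)
    return edges


def count_paths(edges):
    """Count START-to-END paths; assumes the graph has no cycles."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == END:
            return 1
        return sum(paths_from(nxt) for nxt in edges.get(node, ()))
    return paths_from(START)
```

With the two sampled alignments `["ACGTA", "A-G-A"]` and `["ACGTA", "-AGA-"]`, which share only their middle column, `count_paths(build_dag(...))` returns 4: the two originals plus the two alignments obtained by switching samples at the shared column.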